



Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

Neural Information Processing Systems

While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample or time-efficient algorithms for RL with large state-action space exist when the rewards are \emph{heavy-tailed}, i.e., with only finite $(1+\epsilon)$-th moments for some $\epsilon\in(0,1]$. In this work, we address the challenge of such rewards in RL with linear function approximation.
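As a concrete illustration of the heavy-tailed setting (not from the paper): a Pareto distribution with shape parameter $\alpha\in(1,2)$ has finite moments only of order below $\alpha$, so its $(1+\epsilon)$-th moment is finite exactly when $\epsilon < \alpha - 1$, while its variance is infinite. A minimal sketch using Python's standard library:

```python
import random

random.seed(0)

def heavy_tailed_reward(alpha=1.5):
    # Pareto(alpha) with alpha in (1, 2): the mean exists, the
    # (1+eps)-th moment is finite only for eps < alpha - 1,
    # and the second moment (hence the variance) is infinite.
    return random.paretovariate(alpha)

n = 100_000
samples = [heavy_tailed_reward() for _ in range(n)]
mean = sum(samples) / n                    # stabilizes (first moment exists)
second = sum(x * x for x in samples) / n   # dominated by rare huge draws
print(f"empirical mean = {mean:.2f}, empirical 2nd moment = {second:.1f}")
```

The empirical mean settles near $\alpha/(\alpha-1) = 3$, but the empirical second moment keeps drifting upward as $n$ grows, which is the failure mode that bounded-reward RL analyses cannot absorb.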


Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Neural Information Processing Systems

We study the problem of least squares linear regression where the datapoints are dependent and are sampled from a Markov chain. We establish sharp information theoretic minimax lower bounds for this problem in terms of $t_{\mathrm{mix}}$, the mixing time of the underlying Markov chain, under different noise settings. Our results establish that in general, optimization with Markovian data is strictly harder than optimization with independent data and a trivial algorithm (SGD-DD) that works with only one in every $t_{\mathrm{mix}}$ samples, which are approximately independent, is minimax optimal. In fact, it is strictly better than the popular Stochastic Gradient Descent (SGD) method with constant step-size which is otherwise minimax optimal in the regression with independent data setting. Beyond a worst case analysis, we investigate whether structured datasets seen in practice such as Gaussian auto-regressive dynamics can admit more efficient optimization schemes. Surprisingly, even in this specific and natural setting, Stochastic Gradient Descent (SGD) with constant step-size is still no better than SGD-DD. Instead, we propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate. Our improved rate serves as one of the first results where an algorithm outperforms SGD-DD on an interesting Markov chain and also provides one of the first theoretical analyses to support the use of experience replay in practice.
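The data-dropping idea behind SGD-DD can be sketched in a few lines (an illustrative toy, not the paper's analysis): on a Gaussian AR(1) covariate chain, run SGD but update on only one in every $t_{\mathrm{mix}}$ samples, so the samples actually used are approximately independent. The chain parameters and learning rate below are arbitrary choices for the demo.

```python
import random

random.seed(1)

def ar1_stream(n, rho=0.9, w_true=2.0, noise=0.1):
    """Toy Gaussian AR(1) covariate chain x_t = rho*x_{t-1} + eta_t,
    with noisy linear labels y_t = w_true * x_t + noise_t."""
    x = random.gauss(0, 1)
    for _ in range(n):
        x = rho * x + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
        yield x, w_true * x + noise * random.gauss(0, 1)

def sgd_dd(stream, t_mix, lr=0.5):
    """SGD with data drop: update on one in every t_mix samples,
    which are approximately independent; discard the rest."""
    w = 0.0
    for t, (x, y) in enumerate(stream):
        if t % t_mix != 0:
            continue  # drop correlated intermediate samples
        grad = (w * x - y) * x  # gradient of 0.5 * (w*x - y)^2
        w -= lr * grad
    return w

w_hat = sgd_dd(ar1_stream(20_000), t_mix=20)
print(f"estimated w = {w_hat:.2f}")  # should be close to w_true = 2.0
```

Dropping all but every twentieth sample wastes a factor of $t_{\mathrm{mix}}$ in data, which is exactly the inefficiency the paper's experience-replay algorithm is designed to beat on this kind of chain.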


Horizon-Independent Minimax Linear Regression

Neural Information Processing Systems

We consider online linear regression: at each round, an adversary reveals a covariate vector, the learner predicts a real value, the adversary reveals a label, and the learner suffers the squared prediction error. The aim is to minimize the difference between the cumulative loss and that of the linear predictor that is best in hindsight.
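The round-by-round protocol described above can be made concrete with a simple baseline learner (illustrative only; this is online gradient descent, not the paper's minimax strategy). The one-dimensional data and step size are made up for the demo; regret is the cumulative squared error minus that of the best fixed linear predictor in hindsight.

```python
# (covariate, label) pairs, revealed one round at a time by the adversary
data = [(1.0, 1.2), (2.0, 1.9), (-1.0, -1.1), (0.5, 0.4), (3.0, 3.3)]

w, lr = 0.0, 0.1
cum_loss = 0.0
for x, y in data:
    pred = w * x                  # learner predicts before seeing the label
    cum_loss += (pred - y) ** 2   # learner suffers the squared error
    w -= lr * 2 * (pred - y) * x  # online gradient step on the revealed pair

# best fixed linear predictor in hindsight (1-d least squares)
w_star = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
best_loss = sum((w_star * x - y) ** 2 for x, y in data)
regret = cum_loss - best_loss
print(f"regret = {regret:.3f}")
```

The learner's early rounds, played before it has seen any labels, are what generate regret; the question studied in the paper is how small this gap can be made against a worst-case adversary, uniformly over horizons.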






Minimax

Neural Information Processing Systems

We thank the reviewers for appreciating the originality of our work and providing constructive feedback; we address specific concerns below. Random selection in Alg. 1 means sampling uniformly. The intuition behind Thm. 2 is explained …; to interpret Thm. 2 alone: for any algorithm considered, if … There is no missing factor of 2 in Eq. (28) and Eq. (26). Thm. 3 is as follows: for any Pareto optimal rate … Alg. 1 is thus Pareto optimal. The equation after line 115 defines the hardness level of a given problem. Alg. 1 is different from the Distilled … Note that we are also comparing to an algorithm, i.e., QRM2, that allows the reuse of statistics [12]. The lower bound in Section 2 is in the minimax sense, so it suffices to reduce to the single-best-arm case.


Robust Density Estimation under Besov IPM Losses

Neural Information Processing Systems

As shown in several recent papers [Liu et al., 2017, Liang, 2018, Singh et al., 2018, Uppal …], the estimators used in past work rely on the unrealistic assumption that the practitioner knows the Besov space in which the true density lies. The rest of this paper is organized as follows.